Citation: James Dundas and Trevor Mudge. Improving data cache performance by pre-executing instructions under a cache miss. In Proceedings of the 1997 International Conference on Supercomputing (ICS '97), 1997.
Authors: James Dundas and Trevor Mudge
Abstract
In this paper we propose and evaluate a technique that improves first-level data cache performance by pre-executing future instructions under a data cache miss. We show that these pre-executed instructions can generate highly accurate data prefetches, particularly when the first-level cache is small. The technique is referred to as runahead processing. The hardware required to implement runahead is modest because, when a miss occurs, it makes use of an otherwise idle resource, the execution logic. The principal hardware cost is an extra register file. To measure the impact of runahead, we simulated a processor executing five integer Spec95 benchmarks. Our results show that runahead was able to significantly reduce data cache CPI for four of the five benchmarks. We also compared runahead to a simple form of prefetching, sequential prefetching, which would seem to be suitable for scientific benchmarks. We confirm this by enlarging the scope of our experiments to include a scientific benchmark. However, we show that runahead was also able to outperform sequential prefetching on the scientific benchmark. We also conduct studies that demonstrate that runahead can generate many useful prefetches for lines that show little spatial locality with the misses that initiate runahead episodes. Finally, we discuss some further enhancements of our baseline runahead prefetching scheme.
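The abstract describes the mechanism only at a high level. As a rough illustration of the core idea, using the otherwise-stalled miss window to pre-execute the upcoming instruction stream purely to generate prefetches and then discarding the results, the toy Python model below compares stall counts with and without runahead on a load-address trace. The cache geometry (direct-mapped, 64 sets, 64-byte lines), the fixed miss latency, the trace format, and all identifiers (Cache, run, MISS_LATENCY, ...) are illustrative assumptions, not details from the paper; in particular the model ignores invalid-result propagation and the extra register file the real hardware needs.

    # Minimal sketch of runahead prefetching on a toy load-address trace.
    # Assumptions (not from the paper): a direct-mapped cache model, a fixed
    # miss latency measured in "instruction slots", and a trace of per-instruction
    # load addresses (None = no memory access). Illustrative only.

    MISS_LATENCY = 20      # instruction slots a miss would stall the pipeline
    LINE = 64              # cache line size in bytes
    SETS = 64              # direct-mapped: one line per set

    class Cache:
        def __init__(self):
            self.tags = [None] * SETS

        def lookup(self, addr):
            line = addr // LINE
            return self.tags[line % SETS] == line

        def fill(self, addr):
            line = addr // LINE
            self.tags[line % SETS] = line

    def run(trace, runahead=True):
        """Count stall slots with and without runahead prefetching."""
        cache = Cache()
        stalls = 0
        for i, addr in enumerate(trace):
            if addr is not None and not cache.lookup(addr):
                stalls += MISS_LATENCY
                if runahead:
                    # While the miss is outstanding, pre-execute the next
                    # MISS_LATENCY instructions with the idle execution logic.
                    # A real implementation marks the missing load's result
                    # invalid and skips dependent instructions; here every
                    # future load address is assumed to be computable.
                    for j in range(i + 1, min(i + 1 + MISS_LATENCY, len(trace))):
                        a = trace[j]
                        if a is not None and not cache.lookup(a):
                            cache.fill(a)          # prefetch the future miss
                cache.fill(addr)                   # original miss completes
                # Runahead results are discarded; normal execution resumes
                # with the instruction after the missing load.
        return stalls

    if __name__ == "__main__":
        import random
        random.seed(0)
        # Irregular trace: random addresses defeat simple sequential prefetching.
        trace = [random.randrange(0, 1 << 16) for _ in range(5000)]
        print("stall slots, no runahead:", run(trace, runahead=False))
        print("stall slots, runahead:   ", run(trace, runahead=True))

In this sketch a prefetch issued during a runahead episode simply installs the line before it is needed, so later accesses to that line hit; the real design overlaps the prefetches with the original miss latency rather than completing them instantly.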
Similar articles
Improving Performance and Energy Consumption in Region-based Caching Architectures
Embedded systems must simultaneously deliver high performance and low energy consumption. Meeting these goals requires customized designs that fit the requirements of the targeted applications. This philosophy of tailoring the implementation to the domain applies to all subsystems in the embedded architecture. For the memory system, which is a key performance bottleneck and a significant source...
Improving Cache Performance Via Active Management
The Effect of Speculative Execution on Cache Performance
Superscalar microprocessors obtain high performance by exploiting parallelism at the instruction level. To effectively use the instruction-level parallelism found in general purpose, non-numeric code, future processors will need to speculatively execute far beyond instruction fetch limiting conditional branches. One result of this deep speculation is an increase in the number of instruction and...
Crosspoint Cache Architectures
We propose a new architecture for shared memory multiprocessors, the crosspoint cache architecture. This architecture consists of a crossbar interconnection network with a cache memory at each crosspoint switch. It assures cache coherence in hardware while avoiding the performance bottlenecks associated with previous hardware cache coherence solutions. We show this architecture is feasible for ...
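The snippet above only names the organization: a crossbar interconnection network with a cache at every crosspoint switch. A minimal Python sketch of that organization follows; the write-invalidate policy (invalidating the other caches on the same memory-module column on a write), the address interleaving, and all identifiers are assumptions made for illustration, not details taken from the abstract.

    # Toy model of a crosspoint cache: an N-processor x M-memory-module crossbar
    # with a small cache at every crosspoint switch. Coherence is modeled by
    # invalidating the other caches in the same column (same memory module) on
    # a write; this policy is an illustrative assumption.

    class CrosspointCache:
        def __init__(self):
            self.lines = {}                       # address -> value

    class CrossbarWithCaches:
        def __init__(self, n_procs, n_modules):
            self.memory = [dict() for _ in range(n_modules)]
            self.caches = [[CrosspointCache() for _ in range(n_modules)]
                           for _ in range(n_procs)]

        def _module(self, addr):
            return addr % len(self.memory)        # simple interleaving assumption

        def read(self, proc, addr):
            m = self._module(addr)
            cache = self.caches[proc][m]
            if addr in cache.lines:               # hit at the crosspoint switch
                return cache.lines[addr]
            value = self.memory[m].get(addr, 0)   # miss: fetch from the module
            cache.lines[addr] = value
            return value

        def write(self, proc, addr, value):
            m = self._module(addr)
            # Invalidate copies held at other processors' crosspoints on this
            # column so no stale data survives the write.
            for p, row in enumerate(self.caches):
                if p != proc:
                    row[m].lines.pop(addr, None)
            self.caches[proc][m].lines[addr] = value
            self.memory[m][addr] = value

    if __name__ == "__main__":
        xbar = CrossbarWithCaches(n_procs=4, n_modules=4)
        xbar.write(0, 100, 7)
        print(xbar.read(1, 100))   # 7, now cached at crosspoint (1, 0)
        xbar.write(2, 100, 9)      # invalidates the copy at crosspoint (1, 0)
        print(xbar.read(1, 100))   # 9, refetched after invalidation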
Hardware Support for Hiding Cache Latency
As the decrease in processor cycle time continues to outpace the decrease in memory cycle time, even moderately sized on-chip caches may require several cycles of access time in the near future. This means that time is lost, even on a cache hit, if independent instructions cannot be scheduled after a read from memory. A novel hardware device is proposed that keeps track of the history of load i...